-
While parallelism remains the main source of performance, architectural implementations and programming models change with each new hardware generation, often leading to costly application re-engineering. Most tools for performance portability require manual and costly application porting to yet another programming model. We propose an alternative approach that automatically translates programs written in one programming model (CUDA) into another (CPU threads) based on Polygeist/MLIR. Our approach includes a representation of parallel constructs that allows conventional compiler transformations to apply transparently and without modification and enables parallelism-specific optimizations. We evaluate our framework by transpiling and optimizing the CUDA Rodinia benchmark suite for a multi-core CPU and achieve a 58% geomean speedup over handwritten OpenMP code. Further, we show how CUDA kernels from PyTorch can efficiently run and scale on the CPU-only Supercomputer Fugaku without user intervention. Our PyTorch compatibility layer making use of transpiled CUDA PyTorch kernels outperforms the PyTorch CPU native backend by 2.7×.
-
Armakavicius, Nerijus; Kühne, Philipp; Eriksson, Jens; Bouhafs, Chamseddine; Stanishev, Vallery; Ivanov, Ivan G.; Yakimova, Rositsa; Zakharov, Alexei A.; Al-Temimy, Ameer; Coletti, Camilla; et al. (Carbon)
-
Son, Nguyen T.; Anderson, Christopher P.; Bourassa, Alexandre; Miao, Kevin C.; Babin, Charles; Widmann, Matthias; Niethammer, Matthias; Ul Hassan, Jawad; Morioka, Naoya; Ivanov, Ivan G.; et al. (Applied Physics Letters)
-
Singla, Akshi; Simbassa, Sabona B.; Chirra, Bhagath; Gairola, Anirudh; Southerland, Marie R.; Shah, Kush N.; Rose, Robert E.; Chen, Qingquan; Basharat, Ahmed; Baeza, Jaime; et al. (ACS Applied Materials & Interfaces)
